Automatic Tag Attachment Scheme based on Text Clustering for Efficient File Search in Unstructured Peer-to-Peer File Sharing Systems
Abstract
This paper addresses automatic tag attachment for documents distributed over a P2P network, with the aim of improving the efficiency of file search in such networks. The proposed scheme combines text clustering with a modified tag extraction algorithm and is executed in a fully distributed manner. The optimal number of clusters is also determined automatically through a distance cost function. Experiments were conducted to evaluate the accuracy of the proposed scheme. The results indicate that the approach attaches tags effectively and efficiently in real scenarios: for more than 90% of documents, it attaches the same tags as those attached by human reviewers. The experiments also show that the optimal cluster number is almost the same as the number of topics from the website.
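The abstract does not give the details of the clustering step, the cost function, or the modified tag extraction algorithm, so the following is only a minimal single-machine sketch of the general idea under stated assumptions. It assumes plain term-frequency vectors, k-means with a deterministic farthest-first initialisation, a distance cost defined as the sum of centre-to-global-centre distances plus point-to-centre distances, and "tag extraction" reduced to picking the heaviest term of each cluster centre. All function names (`vectorize`, `kmeans`, `distance_cost`, `auto_tag`) and the sample documents are hypothetical, and the distributed execution described in the paper is not modelled.

```python
import math
from collections import Counter

# Hypothetical toy corpus; not taken from the paper.
DOCS = [
    "music genre music audio",
    "music audio music song",
    "music genre music song",
    "network peer network routing",
    "network peer network search",
    "network routing network search",
]

def vectorize(docs):
    """Build term-frequency vectors over a shared, sorted vocabulary."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    vectors = []
    for d in docs:
        counts = Counter(d.lower().split())
        vectors.append([float(counts[w]) for w in vocab])
    return vocab, vectors

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def kmeans(vectors, k, iters=20):
    """Plain k-means; farthest-first seeding keeps the run deterministic."""
    centers = [vectors[0]]
    while len(centers) < k:
        centers.append(max(vectors, key=lambda v: min(dist(v, c) for c in centers)))
    labels = [0] * len(vectors)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist(v, centers[j])) for v in vectors]
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # keep the old centre if a cluster ends up empty
                centers[j] = mean(members)
    return labels, centers

def distance_cost(vectors, labels, centers):
    """Assumed cost: between-cluster spread plus within-cluster spread."""
    global_center = mean(vectors)
    between = sum(dist(c, global_center) for c in centers)
    within = sum(dist(v, centers[lab]) for v, lab in zip(vectors, labels))
    return between + within

def auto_tag(docs, top_n=1):
    """Pick the k with the lowest cost, then tag each document with the
    heaviest terms of its cluster centre (a stand-in for tag extraction)."""
    vocab, vectors = vectorize(docs)
    best = None
    for k in range(1, len(vectors)):
        labels, centers = kmeans(vectors, k)
        cost = distance_cost(vectors, labels, centers)
        if best is None or cost < best[0]:
            best = (cost, k, labels, centers)
    _, k, labels, centers = best
    cluster_tags = []
    for c in centers:
        ranked = sorted(range(len(vocab)), key=lambda i: (-c[i], vocab[i]))
        cluster_tags.append([vocab[i] for i in ranked[:top_n]])
    return k, [cluster_tags[lab] for lab in labels]

if __name__ == "__main__":
    k, tags = auto_tag(DOCS)
    for doc, doc_tags in zip(DOCS, tags):
        print(k, doc_tags, doc)
```

On this toy corpus the cost search settles on two clusters, and every document in a cluster receives that cluster's dominant term as its tag. A real deployment would need a stronger term weighting than raw frequency and a way to exchange cluster summaries between peers, neither of which is shown here.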
Similar resources
Popularity-Based Replication Strategy in Unstructured P2P File Sharing Systems
Peer-to-Peer (P2P) networks have been shown to be an efficient and successful mechanism for file sharing over the Internet. Unstructured P2P systems usually use a blind search method to find the requested data object. Observations have shown that a few peers share most of the data. In order to increase the success rate of blind search, data availability, and load balancing, replication techniqu...
LightFlood: an Efficient Flooding Scheme for File Search in Unstructured Peer-to-Peer Systems
“Flooding” is a fundamental operation in unstructured Peer-to-Peer (P2P) file sharing systems, such as Gnutella. Although it is effective in content search, flooding is very inefficient because it results in a great amount of redundant messages. Our study shows that more than 70% of the generated messages are redundant for a flooding with a TTL of 7 in a moderately connected network. Existing e...
Efficient Music Genre Retrieval Based on Peer Interest Clustering in P2P Networks
Content-based music retrieval is desirable in Peer-to-Peer (P2P) networks, considering its popularity with users and its capacity for semantic search, though its intensive computing cost raises a barrier to efficiency and scalability. In this paper, we propose an approach to music genre retrieval based on peer interest clustering. Automatic music feature extraction and adaptive shared music file clust...
P2P Network Trust Management Survey
Peer-to-peer (P2P) applications are no longer limited to home users and are starting to be accepted in academic and corporate environments. While file sharing and instant messaging applications are the most traditional examples, they are no longer the only ones benefiting from the potential advantages of P2P networks. For example, network file storage, data transmission, distributed computing, and co...
Connectivity Based Node Clustering in Decentralized Peer-to-Peer Networks
Connectivity-based node clustering has wide-ranging applications in decentralized Peer-to-Peer (P2P) networks such as P2P file sharing systems, mobile ad-hoc networks, P2P sensor networks and so forth. This paper describes a Connectivity-based Distributed Node Clustering scheme (CDC). This scheme presents a scalable and efficient solution for discovering connectivity based clusters in peer n...
Journal:
- J. UCS
Volume 18, Issue
Pages -
Publication date: 2012